Gut microbiome for predicting immune checkpoint blockade-associated adverse events

Background The impact of the gut microbiome on the initiation and intensity of immune-related adverse events (irAEs) prompted by immune checkpoint inhibitors (ICIs) is widely acknowledged. Nevertheless, there is inconsistency in the gut microbial associations with irAEs reported across various studies. Methods We performed a comprehensive analysis leveraging a dataset that included published microbiome data (n = 317) and in-house generated data from 16S rRNA and shotgun metagenome samples of irAEs (n = 115). We utilized a machine learning-based approach, specifically the Random Forest (RF) algorithm, to construct a microbiome-based classifier capable of distinguishing between non-irAEs and irAEs. Additionally, we conducted a comprehensive analysis, integrating transcriptome and metagenome profiling, to explore potential underlying mechanisms. Results We identified specific microbial species capable of distinguishing between patients experiencing irAEs and non-irAEs. The RF classifier, developed using 14 microbial features, demonstrated robust discriminatory power between non-irAEs and irAEs (AUC = 0.88). Moreover, the predictive score from our classifier exhibited significant discriminative capability for identifying non-irAEs in two independent cohorts. Our functional analysis revealed that the altered microbiome in non-irAEs was characterized by an increased menaquinone biosynthesis, accompanied by elevated expression of rate-limiting enzymes menH and menC. Targeted metabolomics analysis further highlighted a notably higher abundance of menaquinone in the serum of patients who did not develop irAEs compared to the irAEs group. Conclusions Our study underscores the potential of microbial biomarkers for predicting the onset of irAEs and highlights menaquinone, a metabolite derived from the microbiome community, as a possible selective therapeutic agent for modulating the occurrence of irAEs. Supplementary Information The online version contains supplementary material available at 10.1186/s13073-024-01285-9.


Background
Immune checkpoint inhibitors (ICIs), contributing to durable remissions in a subset of cancer patients, have reshaped multiple cancer therapy paradigms over the past several decades [1,2].The traditional ICI drugs, including anti-cytotoxic T lymphocyte-associated antigen 4 (CTLA-4) and other checkpoint inhibitors targeting the programmed death 1(PD-1)/programmed death ligand 1 (PD-L1) pathway, were approved for the treatment of multiple cancers [3][4][5].Although ICIs were confirmed to boost the tremendous clinical benefit of defense against cancer, they might bring adverse effects, collectively termed as immune-related adverse events (irAEs), that represented autoimmune effects and damage on the normal tissues suffering misdirected overactivation of the immune systems [6].Common irAEs encroached on organs involving the skin (e.g., rash), the gastrointestinal tract (e.g., emesis or colitis), lungs (e.g., pneumonitis), heart (e.g., myocarditis), liver (e.g., hepatitis), and endocrine organs (e.g., autoimmune diabetes), etc. [7].These toxicities were important causes of treatment discontinuation and can debilitate long-term clinical responses and life quality [8].Previous evidence demonstrated that patients who received anti-CTLA-4 therapy suffered from a higher incidence of irAEs than ICIs targeting PD-1/PD-L1 [9].Moreover, histologically pathologic features of anti-PD-1/PD-L1 therapy-related colitis were distinct from anti-CTLA-4 therapy [10].These supported the idea that effective biomarkers for irAEs should be specific to the ICI agent used.
Strikingly, patients who experienced irAEs usually had a relatively improved progression-free survival (PFS) and overall survival (OS) compared with those who suffered none of irAEs [11,12].Therefore, the effective prevention and treatment of irAEs is a clinical paradox to maximize the utility of ICIs.Oncologists must weigh the risk of irAE onsets against the benefit of ICIs before prescribing ICI agents.Thus, intensive efforts were encouraged to identify potential biomarkers for the occurrence and severity of irAEs, thereby guiding the rational medications and surveillance strategies for high-risk patients allowing for earlier intervention [13].
Emergent studies have struggled to investigate the potential mechanisms and related strategies to recognize the patients who might be susceptible to suffering from irAEs.However, lack of scientific rigor and reproducibility, in concert with complex laboratory diagnosis and limitative study scale, prevented comprehensive approaches to identify highly effective biomarkers of irAEs from becoming standard practice [14][15][16][17].Accumulating evidence provided insights that the gut microbiota could be leveraged for mounting antitumor efficacy with ICI therapy which might also influence the development of irAEs [18][19][20][21][22][23][24][25].The baseline gut microbiota composition could serve as an important determinant of irAE onset [23,25].However, the present microbial traits on the baseline for discriminating susceptible optimates were inconsistent among studies [19], probably accounted for by the limited study size and discrepant treatment strategy.Accordingly, integrated analysis with more microbiome data was required for identifying effective microbial biomarkers for toxicity prediction.
Recent studies have also proposed gut microbiomebased predictive models for ICI efficacy of specific diseases [26][27][28][29].However, highly effective microbial biomarkers for predicting the development of irAEs using more pooled data have been hardly proposed and are under investigation [13].Developing potential microbial biomarkers for predicting the irAE occurrence probabilities and discovering potential mechanisms driving checkpoint blockade toxicities will have important clinical implications [6].More specific therapeutic strategies could be available for identifying high-risk patients prior to treatment and thus supplying preventive methods to mitigate adverse effects.
Herein, a machine-learning algorithm was employed to uncover potentially predictive gut microbiota associated with irAEs, particularly in patients undergoing treatment with anti-PD-1/PD-L1 drugs.The composition of the fourteen gut microbes at the treatment baseline presents an intriguing and novel investigational avenue for predicting the incidence of irAEs, supported by a robustly validated model performance.Furthermore, our study provides new insights into preventing irAEs and uncovers potential mechanisms from a microbiological functional perspective.
For the further meta-analysis, only three studies on a larger scale with more than 50 patients using anti-PD-1/PD-L1 drugs were included as the training datasets for model construction.Notably, we obtained the metadata of Hakozaki et al. [32] from the supplementary materials of another meta-analysis carried out by Shaikh et al. [36].Only the baseline gut microbiome data defined by the original paper were included for the following analysis, and notably McCulloch et al. used a landmark timepoint to identify the baseline gut microbiome [20].The irAEs status was evaluated using the original metadata and dichotomized to irAEs and non-irAEs, and irAEs were defined once the patients were recorded with any irAEs in the original paper and vice versa.Colon RNA sequencing data and 16S rRNA amplicon sequencing data from an independent cohort carried out by Baruch et al. [37] with accession numbers GSE162436 (https:// www.ncbi.nlm.nih.gov/ biopr oject/?term= GSE16 2436) and PRJNA678737 (https:// www.ncbi.nlm.nih.gov/ biopr oject/ PRJNA 678737) were utilized to speculate the potential mechanism for microbial traits of the model [37].

Patient recruitment and sample collection
Baseline stool samples were collected from patients with pan-cancer who were initiated with anti-PD-1/PD-L1 drugs from Renji Hospital, Shanghai Jiao Tong University School of Medicine (SH Cohort, N = 65) and Xuzhou Central Hospital (JS Cohort, N = 50) (Table 1, Additional file 1: Table S3).Patient recruitment and sample collection were approved by the Medical Ethics Committee of Renji Hospital, Shanghai Jiao Tong University School of Medicine (ID: LY2020-067-B) and Xuzhou Central Hospital of Xuzhou Medical University (ID:XZXY-LJ-20200110-090). Written informed consent was obtained from each participant.This study protocol is in accordance with the approved World Medical Association Declaration of Helsinki (2008) and the Belmont Report.Treatment responses were evaluated through the Response Evaluation Criteria in Solid Tumors 1.1 (RECIST 1.1) [38] as determined in the original studies.Patients with complete response, partial response, or/ and stable disease lasting more than 6 months, according to RECIST 1.1 criteria, were classified as responders, whereas patients with progressive disease or stable disease lasting less than 6 months were classified as nonresponders.Immunotherapy-related adverse events were evaluated and identified retrospectively based on the Common Terminology Criteria for Adverse Events (CTCAE), version 5.0.Patients were recruited for initial diagnosis and had never received any ICI treatment before fecal sample collection.The stool was collected in fecal collection tubes and was stored at − 80 °C.We collected fresh blood samples from two groups of patients: those with confirmed irAEs (N = 10) and those without irAEs (N = 10) who had undergone anti-PD-1/PD-L1 blockade therapy.Serum was extracted by centrifugation at 3000 rpm for 10 min and stored at − 80 °C.

Table 1 Clinical features of the studies employed in this investigation
Note: Each study is indicated by first author and year of publication.a Studies utilized for reanalyzing the differential microbes between irAEs and non-irAEs.b Studies utilized for constructing predictive microbial Random Forest classifier for anti-PD-1/PD-L1-associated irAEs.c Studies utilized as external validation.PD-1 programmed death 1, PD-L1 programmed death ligand 1, CTLA-4 cytotoxic T lymphocyte-associated antigen 4, NE/-not evaluated/not available

Data preprocessing
The 16S rRNA sequencing data were analyzed using QIIME2 (V.2022.2) [39], a plugin-based platform for microbiome analysis.DADA2 (V.2022.1)[40] software, wrapped in QIIME2, was used to filter out sequencing reads with quality score Q > 25 and denoise reads into ASVs, resulting in feature tables and representative sequences.Taxonomy classification was assigned based on the naïve Bayes classifier using the classify-sklearn package against the RDP Classifier_16S_V18 reference sequences [41].ASVs that could not be precisely annotated to species were reassigned to ones having the most similar sequences in the same genus (or family) using NCBI Blast.Subsequently, representative sequences were aligned and merged within the same annotation on species using the function tax_glom() in phyloseq R package.
Then, the feature tables were converted to relative abundance tables.
Raw FASTQ reads of shotgun metagenomic data underwent quality filtering using fastp with default parameters.Using mOTUs3 (Marker gene-based operational taxonomic units) [42] with default parameters, we profiled the microbial composition for each sample and counted the number of reads mapping to given phylogenetic genes.Each marker gene is given a specific operational taxonomic unit (OTU) acting as resolutive microbes.The output files were loaded into R and packaged into a phyloseq object for ease of analysis.The relative abundance values were calculated and converted for the following analysis.

Confounder analysis and covariate evaluation
We employed an ANOVA-like analysis referring to a previous study [27] to assess the impact of potential confounding variables and the presence of a disease.The total variance of a specific amplicon sequence variant (ASV) was compared to the variance explained by irAEs status (irAEs and non-irAEs) and the variance by confounding factors (age, BMI, sex, antibiotics, drug types, tumor staging, patients PD-L1 status, and study).
Variance calculations were performed on ranks to account for non-Gaussian distribution of microbiome abundance data.Potential confounding factors with continuous values were transformed into discrete variables either as quartiles or in the case of BMI as groups of lean (> 25), overweight (25)(26)(27)(28)(29)(30), and obese (> 30) based on conventional cutoffs, PD-L1 status were transformed into discrete variables with the cutoff value of 50%.To assess whether different tumor types influence the relationship between gut bacteria and irAEs, we employed a method known as aPCoA (covariate-adjusted principal coordinates analysis) [44].

Meta-analysis of important abundant species
Considering the "study" accounted for the largest confounder, we used the MMUPHin (Meta-analysis Methods with a Uniform Pipeline for Heterogeneity in microbiome studies) [45] for microbial community batch correction on the confounder factor "study, " a new algorithm extending the batch correction method developed for gene expression data in ComBat with an additional component to allow for the zero-inflated nature of microbial abundance data.The significance of differential abundance was tested on a single ASV using a two-sided blocked Wilcoxon rank-sum test implemented in the R (V.4.1.2) "coin" package (To identify more potential microbial biomarkers, P values < 0.1 were considered to bring into the further analysis).Confounder with a high variance explanation was also defined as a block to adjust the batch effects in the differential analysis.

Model construction and feature selection
The integrated anti-PD-1/PD-L1 cohort was utilized as the training subsample (n = 190), for which we developed the prediction algorithm.The features used for model building consist of important differential features (P < 0.1) as well as patient metadata features including age and sex.For the sake of incomplete data on the BMI, tumor stage, etc., we did not integrate other metadata for model construction.
Subsequently, the Random Forest (RF) models were built with 501 estimator trees and each tree had 10% of the total features.Then, an Iterative Feature Elimination (IFE) step was used to filter features and optimize the performance of subsequent RF models.The top features from the top-performing model were selected as "important features".The permutation-based importance (function Permutation Importance) from the ELI5 Python package (https:// eli5.readt hedocs.io.) was finally utilized to compute the feature importance for models.
We used abundance profiles including the most important abundant microbial species and assessed how well classifiers trained in cross-validation on one study generalize in evaluations on the other two studies (termed as study-to-study transfer of classifier).And we also further assessed if including data from all but one study in model training improves prediction on the remaining hold-out study (also termed as leave-onestudy-out (LOSO) validation).

Model evaluation and external validation
Using the important microbiome features, we built RF Classifiers in the scikit-learn (V.0.19.2) package with stratified tenfold cross-validation to distinguish the patients with irAEs or non-irAEs.The receiver operating characteristic (ROC) curves and the area under the curve (AUC) were performed for model performance evaluation.To calculate the probability of irAEs onsets, we developed a robust score for the RF classifier with input microbial features (hereafter called RF score).The RF score for both the training and the validation datasets were calculated using function predict_proba() in the scikit-learn (V.0.19.2) package.The optimal thresholds of the RF score, which discriminate irAEs and non-irAEs, were computed by using Youden's index method in the training set with the pROC package.The higher RF score indicated a lower probability for patients developing irAEs.Model predictive performance was measured by multi-metrics including sensitivity, specificity, accuracy, positive predictive value (PPV), and negative predictive value (NPV).Model specificity evaluation was conducted between irAEs and response using the Bootstrap Hypothesis testing.
Two additional cohorts, including the Shanghai Cohort (SH cohort, N = 65) and Jiangsu Cohort (JS cohort, N = 50) were used as independent cohorts for validation.The clinical information is demonstrated in Additional file 1: Table S3.RF scores for the validation datasets were calculated using the function predict_proba () in the scikit-learn (V.0.19.2) package.Then ROC curves and the AUC value for model performance evaluation were conducted using the pROC function.

Functional profile analysis
To investigate the potential mechanisms of the functions of our model species, PICRUSt2 (V.2.0.3) as previously published was utilized for predicting microbiome genome and function inferred from 16S rRNA sequences [46].We collected the output results of PICRUSt2 predictions based on several gene family databases by default, including KEGG orthologs (KOs), Enzyme Commission numbers (EC numbers), and MetaCyc pathway.
The HMP Unified Metabolic Analysis Network 2 (HUMAnN2) tool [47] was further utilized for profiling the microbial function from shotgun metagenomic data.Genes/proteins and pathways abundances of the microbiota between samples were quantified using UniRef50 gene clusters in conjunction with MetaCyc databases.

DNA extraction, 16S rRNA gene sequencing, and shotgun metagenome sequencing
DNA was extracted from fecal samples using HiPure Stool DNA Mini Kit (Magen, D314103, China) following the manufacturer's instructions.The full-length primer sequences designed for amplifying the V3-V4 hypervariable region of 16S rRNA gene including Illumina adaptors were as follows: forward: CCT ACG GGNGGC WGC AG, and reverse: GAC TAC HVGGG TAT CTA ATC C. Microbial genomic DNA was used to start the polymerase chain reaction (PCR) protocol.A volume of 1 μL of the PCR product was quantified using a fluorometer (Qubit, Invitrogen).After size verification, the libraries were sequenced using a 2 × 250 pb paired-end run on a MiSeq sequencer, according to the manufacturer's instructions (Illumina).
DNA for shotgun metagenome sequencing was extracted and quantified as above.DNA was then sheared to the desired insert size, and products were brought to 50 μL using 1X VAHTS DNA Clean Beads.After endrepreparation and adapter ligation, libraries are generated and run through 10-15 cycles of PCR with KAPA Hyper Prep Kit (Roche Sequencing Solutions, Pleasanton, CA).The Qubit Fluorometer and the Agilent 2100 Bioanalyzer were used for library quantification.Finally, libraries were sequenced on the Illumina Nova6000 on a 2 × 150 bp paired-end run.

qRT-PCR validation
Real-time qPCR to quantify the abundance and expression of the differential gene menH and menC was performed on a subset of samples in the Jiangsu Cohort (17non-irAEs and 11 irAEs).To quantify menC and menH genes relative to the total bacterial RNA/DNA in a sample, qPCR was performed in triplicates for the 16S rRNA and menC and menH genes, respectively.We utilized primers based on a previous study [29] for menC and menH, along with standard primers for universal eubacteria 16S as in prior research [48].Real-time PCR reactions were prepared with a final primer concentration of 0.5 μM in a 20-μl final reaction volume and then were performed with a SYBR Green qPCR Mix on a Ste-pOnePlus real-time PCR system (Applied Biosystems).Cycling conditions were performed as described in the protocol.− ΔCt values were calculated as the difference between menC or menH and 16S Ct values.The significance of the comparison between irAEs and non-irAEs samples was tested on the − ΔCt values using the Wilcoxon test as confirmation of metagenomic enrichment.

Bacteria culture and supernatant collection
Parabacteroides merdae (PM, BNCC358131) and Lactobacillus Salivarius (LS, BNCC194719) were purchased from BeNa Culture Collection (BNCC, Shanghai, China).PM was cultured overnight at 37 °C under anaerobic conditions (DG250, Don Whitley Scientific, West Yorkshire, UK) in brain heart infusion (BHI) broth with the protocol.The LS was cultured in deMan Rogosa Sharpe (MRS) medium overnight at 37 °C in shake cultivation at 220 rpm/min.Bacteria were cultured until OD 600 1.0-1.2 and cultures were collected and stored as 0.5-1-ml aliquots at − 80 °C until used for the quantification of menaquinone.

High-performance liquid chromatography analysis
The HPLC-MS/MS analysis followed the methodology described in previous studies [49,50].Separation was accomplished on the Zorbax Eclipse Plus C18 column (2.1 × 50 mm, 1.8 µm, 600 bar).Elution was carried out with the following solvent system: 0.1% formic acid in water (Mobile Phase A) and acetonitrile plus 0.1% formic acid (Mobile Phase B).The flow rate was set at 0.5 ml/ min, employing an isocratic elution program with Pump A at 0% and Pump B at 100% for 10 min.Sample injection was carried out using an automatic sampler with a sample tray temperature maintained at 4 °C, and 20 μL of the sample was utilized for analysis.For quantifying menaquinone, Menaquinone-6 (MK-6) (Macklin, CAS #84-81-1) was employed as the standard.

Characteristics of studies involved in analyzing irAEs-associated gut microbiomes
Based on the availability of baseline16S rRNA sequencing data and detailed population clinical information, we finally included seven studies for the following analysis (Table 1).Totally, we collected 218 patients receiving anti-PD-1/PD-L1 immunotherapy from 4 studies, 60 patients receiving anti-CTLA-4 immunotherapy from 2 studies, and 39 patients receiving combined anti-PD-1/ PD-L1 and anti-CTLA-4 immunotherapy (Fig. 1A).We also observed there is no statistically significant association between adverse reaction outcomes and tumor types (P-value = 0.1523, Additional file 2: Fig. S1B).Subsequently, we waived the studies conducted by Chau et al. for the anti-PD-1/PD-L1-associated irAEs model construction for the sake of its small scale (n = 23).The detailed demographic information and study characteristics are demonstrated in Table 1 and Additional file 1: Table .S2.Similar to other classic sequencing data from human gut microbiota [51], Firmicutes and Bacteroidetes represented the two most dominant bacterial phyla of the total community among all studies.However, the relative abundance of microbiomes in the phylum level seemed to vary from different studies and irAEs status.Previous studies reported an elevated presence of Firmicutes was linked to a greater likelihood of experiencing irAEs, while the phyla Bacteroidetes and Proteobacteria tended to be more abundant in individuals without irAEs [25,52].In this study, despite no significant statistical difference, we found that Firmicutes appeared to be higher in patients with irAEs (median value, 0.316 vs. 0.273) while Bacteroidetes seemed to be lower in patients with irAEs (median value, 0.151 vs. 0.229) as compared to non-irAEs (Fig. 1B  and C).

Potential confounders adjustment and baseline gut microbial composition evaluation for distinguishing anti-PD-1/PD-L1 immunotherapy-associated irAEs and non-irAEs
To create a gut microbiome signature for predicting the occurrence of irAEs, we finally incorporated three studies with anti-PD-1/PD-L1 immunotherapy for the metaanalysis in consideration of the integrity of demographic information and sample size.Since both technical and biological confounders might exist in different studies, we calculated the variances explained by irAEs status and other clinical variates for each species to quantify the potential confounder effects (See Supplementary Materials and Methods).Remarkably, the variance of species explained by "Study" was found to be more predominant than other confounders (Fig. 2A, Additional file 2: Fig. S2).On the phylum level, Firmicutes and Bacteroidetes, accounting for the top two of the most predominant phyla, showed more variation in their ratios among studies (Fg.2B).Zhang et al. 's study demonstrated a higher level of Bacteroidetes while Firmicutes is mostly composed in the other two studies.Additionally, both the alpha diversity and beta diversity varied among different studies (P < 0.001, Fig. 2D, and Additional file 2: Fig. S1.C-D).All above indicated that the factor "study" brought a great impact on gut microbial composition.Therefore, we treated "study" as a blocking factor for the adjustment of the batch effect in the further analysis and a two-sided blocked Wilcoxon rank-sum test was utilized to test the significance between non-irAEs and irAEs.
The baseline gut microbiome of irAEs was demonstrated to be distinct from non-irAEs.The alpha diversity indexes including Fisher alpha and Richness were significantly higher in non-irAEs (P < 0.01, Fig. 2C).The Shannon index and Simpson index had also a higher trend in the non-irAEs group.In addition, the beta diversity calculated using Bray-Curtis distance was found to be significantly different in irAEs and non-irAEs after pooling the data sets (P = 0.001, PERMANOVA, Fig. 2D), whereas no significant differences were found in other clinical factors such as age and gender (P > 0.05, PERMANOVA, Additional file 2: Fig. S2B-C).This indicated the baseline gut microbiome constitutes were different in irAEs and non-irAEs.Convincingly, a set of differential microbes more significantly explained by "irAEs status" could be identified.
To assess whether different tumor types influence the relationship between gut bacteria and irAEs, we initially presented the PCoA plot without considering tumor types.Subsequently, we applied the aPCoA algorithm to correct for tumor types as covariates.Our findings suggest that there was no substantial alteration in the PCoA plots following correction for tumor types.Furthermore, the extent of variation explained by the PCoA experienced only minimal changes before and after tumor type correction (Additional file 2: Fig. S1E-F).This indicated that gut microbes might associate with the irAEs regardless of tumor type.

Identification of irAEs-specific species and model construction for anti-PD-1/PD-L1 immunotherapy-associated irAEs
An outstanding question is to search for microbial biomarkers predicting anti-PD-1/PD-L1 immunotherapyassociated irAEs.The model construction pipeline is described in Fig. 3A.Sixty-two microbes were identified as irAEs-associated microbial traits, of which 11 species were also found to correlate with response.In order to For each AUC detailed: "ROC Fold 1 (AUC = 0.89), " "ROC Fold 2 (AUC = 0.93), " "ROC Fold 3 (AUC = 0.97), " "ROC Fold 4 (AUC = 1.00), " "ROC Fold 5 (AUC = 0.94), " "ROC Fold 6 (AUC = 0.97), " "ROC Fold 7 (AUC = 0.91), " "ROC Fold 8 (AUC = 0.57), " "ROC Fold 9 (AUC = 0.71), " "ROC Fold 10 (AUC = 0.95)." D Prediction performance of important features using study-by-study and leave-one-study-out (LOSO) validation.The heatmap shows the area under the receiver operating characteristic curve (AUROC) from cross-validations within each study (blue boxes along the diagonal) and study-to-study model transfer (external-validations off-diagonal).The last column shows the average AUROC for study-to-study predictions.construct more specific microbial traits model for irAEs, the intersection of response and irAEs-associated microbial traits were subsequently ruled out, and ultimately 51 important microbial features were regarded as irAEspecific species for the following feature filtering (Additional file 1: Table.S6).In total, 14 microbial features were filtered to show the best average AUC as well as the predominantly discriminatory power for identifying irAEs and non-irAEs after IFE (Additional file 2: Fig. S3A-O).A robust RF model was eventually constructed with a core set of best features, including 14 differential microbes (hereafter termed as RF14 classifier), which achieved an average AUC of 0.88 for distinguishing non-irAEs from irAEs (Fig. 3C, Additional file 1: Table.S7).Most of the species involved in the model were significantly enriched in the non-irAEs in the integrated data (Additional file 2: Fig. S3).Among these, the ASV assigned as Lactobacillus salivarius was identified as the top highest-ranking biomarker.To test whether the identified microbial features in our RF14 classifier are universal and robust across multiple studies, we performed study-to-study transfer validation and LOSO validation on the entire samples.In our RF14 classifier, the AUC values of studyto-study transfer validation ranged from 0.65 to 0.89, with an average of 0.74 (Fig. 3D).Additionally, the AUC values of LOSO analysis ranged from 0.72 to 0.90 (average AUC = 0.82, Fig. 3D), which was better than those achieved in study-to-study transfer validation owing to using a larger amount of training data (Fig. 3D).All above demonstrated that our microbial-derived biomarker panel had excellent accuracy across studies.

Machine learning-based microbiome model performance assessment and external validation
We used a different number of input features, including all features, differential features, all important features, and top features (feature ranks after calculating the permutation-based importance), to test the predictive capability.The average AUC values were consistently calculated and compared in study-to-study transfer validation, LOSO validation, and the integrated ten-fold Random Forest analysis, respectively.Obviously, the set of 14 microbes achieved higher predictive performance compared to other number of features used in all evaluation methods (Fig. 4A).The RF score derived from our RF14 classifier for non-irAEs was significantly higher than irAEs (Fig. 4B, Wilcoxon rank-sum test).We further calculated the predictive performance of the gut microbiome-based model as measured by sensitivity, specificity, accuracy, positive predictive value (PPV), and negative predictive value (NPV).The RF14 classifier showed high predictive performance as measured by sensitivity, specificity, accuracy, PPV, and NPV (Fig. 4D and Additional file 2: Fig. S4A).Moreover, as expected, our RF14 classifier demonstrated significant specificity for predicting the occurrence of irAEs instead of ICIs efficacy (Fig. 4D, P = 2.137e − 14, Bootstrap Hypothesis Test).
To further validate our RF14 classifier for irAEs prediction, two additional independent cohorts, SH cohort (N = 65) and JS cohort (N = 50), were incorporated into this study.SH cohort consisted of 23 irAEs and 42 non-irAEs patients using 16S rRNA amplicon sequencing strategy for gut microbiome analysis.And JS cohort, consisting of 16 irAEs and 34 non-irAEs, adopted shotgun metagenome sequencing strategy.We calculated the relative RF score for the validation cohorts and used the precision-recall (PR) curves and AUC value under the ROC curve for model evaluation.Both the SH cohort and JS cohort achieved reasonable performance for distinguishing non-irAEs from irAEs, with relatively superior model metrics (Fig. 5A-F).This indicated that the gut microbiome identified in our RF14 classifier possessed robustness for discriminating non-irAEs from irAEs.
Meanwhile, using the same model construction methods, we reconstructed RF models for the small-scale dataset from Chau et al. and identified five important species achieving the best AUC value of 0.95 (Additional file 2: Fig. S4E).Notably, the important biomarkers were 3 out of 5 included in our discovery RF14 classifier.Additionally, the features' ranks were consistent with our RF14 classifiers.For instance, ASVs assigned as Lactobacillus salivarius and Parabacteroides goldsteinii were also confirmed as the top-ranking biomarkers in our RF14 classifier (Additional file 1: Table.S7).Our research results indicate that the model constructed with features selected at P < 0.05 ultimately included 18 microbial features and achieved a maximum area under the curve (AUC) of 0.83 (Additional file 2: Fig. S4F).

Microbial function exploration and validation
We further examined the microbiome-based functional alterations using the 16S rRNA data from training datasets.There were 110 differential pathways between irAEs and non-irAEs based on the results of MetaCyc pathway abundances calculated in PICRUSt2 (Additional file 1: Table.S8).Differential pathways with the FDR less than 0.005 were filtered out and clustered based on their abundance (Fig. 6A).
Notably, the menaquinone (also termed as vitamin K2) biosynthesis was significantly enriched in non-irAEs compared with irAEs (Fig. 6A, Additional file 1: Table.S8).The biosynthesis of 1,4-dihydroxy-2-naphthoate, an important intermediate of menaquinone biosynthesis, was also found to be enriched in non-irAEs (Additional file 1: Table.S8).We further calculated Spearman's correlation between the menaquinone-related pathways and the abundance of species involved in our RF14 classifier.Interestingly, the menaquinone biosynthesis significantly correlated to the species that enriched in non-irAEs (Fig. 6B), which was also consistent when performing PICRUSt2 analysis on Chaput's and Dubin's cohorts (Additional file 2: Fig. S5 C-H).Moreover, 420 differential limited enzymes were identified from the EC metagenome prediction (Additional file 1: Table.S9).Consistently, we noticed that several rate-limiting enzymes in the biosynthesis of menaquinone, including menH (EC:4.2.99.20) and menC (EC:4.2.1.113),were significantly elevated in non-irAEs compared with that of irAEs in pooled data sets.These results were also confirmed by qPCR with our newly collected fecal samples (Fig. 6C, D), showing that menH and menC genes were significantly increased in the non-irAEs samples than those in the irAEs samples.Collectively, the declining menaquinone biosynthesis and its key genes from gut microbes were more related to the higher incidence of irAEs.Since shotgun metagenomic sequencing data allow for a more accurate analysis of the microbial function, we further used additional shotgun metagenomics data derived from JS Cohort (N = 50) to examine the alteration of metabolic pathways and orthologous gene families between patients with irAEs and non-irAEs.Consistently, the expression of microbial genes menH and menC were significantly upregulated in non-irAEs (Fig. 6E, F).
In addition, pathway analysis demonstrated the abundance of related pathways referring to the biosynthesis of menaquinone was significantly enriched in non-irAEs (Fig. 6G-N).All results indicated that microbiomederived menaquinone might serve as potential functional microbial metabolites for defending the occurrence of irAEs.

Quantification analysis on menaquinone with HPLC-MS
We performed HPLC-MS analysis of menaquinone in bacterial metabolites and serum samples collected  from individuals both with and without irAEs.Notably, HPLC-MS/MS results confirmed menaquinone production in bacterial supernatants, particularly from two representative microorganisms, Parabacteroides merdae and Lactobacillus salivarius, when compared to control media (refer to Additional file 2: Fig. S5C-F).This highlights the potential of these species for menaquinone biosynthesis.Furthermore, serum menaquinone levels showed a significant increase in individuals without irAEs compared to those with irAEs (P = 0.005, as depicted in Fig. 6O).

Potential mechanism deduction
To further elucidate how the microbial features involved in our RF14 classifier contribute to the development of irAEs, we used an extra study, consisting of 9 melanoma patients who received anti-PD-1/PD-L1 treatment with both fecal 16S rRNA amplicon sequence and bulk RNA sequencing on the normal intestinal or colorectal mucosa available.The nine patients were then stratified by the RF14 classifier into RF score-high and score-low groups via the median value.Notably, the inflammation-associated gene, CCL21, was significantly enriched in the RF score-low group (Additional file 1: Table.S10; Additional file 2: Fig. S5A).Interestingly, several pro-inflammation pathways including the chemokine signaling pathway and NF-κB signaling pathway were significantly elevated in the score-low group utilizing Gene Set Enrichment Analysis (GSEA) (Additional file 2: Fig. S5 B, Additional file 1: Table.S11).Remarkably, several reports had demonstrated that NF-κB signaling pathway could be inhibited by menaquinone, resulting in anti-inflammation effects [54,55].
Taken together, these results indicated that the metabolites derived from the gut microbiome like menaquinone might exert potential protective effects via inhibiting the pro-inflammation signaling pathway such as NF-κB signaling pathway.Our data supported the idea that supplemented menaquinone during the anti-PD-1/ PD-L1 immunotherapy might mitigate or prevent the occurrence of irAEs.

Discussion
ICI drugs elicit the destruction of cancer cells by reliving inhibitory T cell signaling, while tipping the balance between the normal tissue and immune system as well.Predictive strategies that define the risk of irAEs are essential for optimizing ICI use, or alternatively for redirecting patients towards safer therapeutic modalities.Heretofore, growing evidence has demonstrated cytokines potentiate the development of irAEs, and cytokine-targeted therapies have been established for the long-term alleviation of irAEs and brought into clinical use for curbing severe irAEs [56].However, the preventive strategy tailored for irAEs remains a challenge since "one size fits all" is not adequate for this setting.Gut microbiome and their metabolites have demonstrated effective synergistic antitumor response with ICI therapy and alleviate the toxicity induced by ICI drugs.
Data from the preclinical models and clinical observations indicated that the gut microbiome processed great potential in modulating the development of irAEs [21].Pdcd1 − / − mice, which lack the inhibitory receptor Pdcd-1 and are usually utilized as a model mimicking the systematic function of anti-PD-1 drugs [57,58], demonstrated obvious composition shift and diversity reduction on the gut microbiota, especially a dramatic reduction of Lactobacillales [59].This was consistent with the microbial traits identified in our model, where we found the top-ranking species Lactobacillus salivarius for predicting irAEs, and exerts the strongest relevance with menaquinone abundance.In addition, exogenous supplements of probiotics such as Bifidobacterium and Lactobacillus were also confirmed to ameliorate the DSS plus anti-CTLA-4 colitis in murine model [60,61].Parabacteroides genus and Ruminococcus genus (See figure on next page.)Fig. 6 Microbial functional alterations in irAEs and non-irAEs.A Differentially abundant pathways were plotted; P values were computed using a two-sided blocked Wilcoxon rank-sum test and the FDR < 0.005 were presented in the heatmap.B The correlation between the abundance of menaquinone biosynthesis-related pathway and model species.Spearman's correlation between the abundance of menaquinone biosynthesis and 14 representative microbial species in the classifiers and edge width corresponds to the Spearman's r statistic and edge color denotes the statistical significance.r, Spearman correlation coefficient; with a color-gradient denoting Spearman's correlation coefficients, and the exact values were described in heatmap frames.C, D Plotted values are qRT-PCR quantifications of bacterial genes in menaquinone biosynthesis.The abundances of menC (C) and menH (D) were compared between non-irAEs (n = 17) and irAEs (n = 11) groups.All boxes extend from 25 to 75th percentiles and whiskers show the minimum and maximum values.Lines at the middle of each box show the median.P values were computed using a two-sided Wilcoxon rank-sum test.E-M Shotgun metagenome functional validation (N = 50) for differential genes menC (E), menH (F), and menaquinone biosynthesis pathway (G-M).All boxes extend from 25 to 75th percentiles and whiskers show the minimum and maximum values.Lines at the middle of each box show the median.P values were computed using a two-sided Wilcoxon rank-sum test.O Blood concentration comparison of menaquinone-6 (MK-6) between patients with irAEs( N = 10) and patients without irAEs(N = 10).Statistical significance was assessed using a two-sided Wilcoxon rank-sum test were also deciphered to be probiotics for the remission of inflammation and disease development [62].Of note, two species of Parabacteroides genus (including Parabacteroides goldsteinii and Parabacteroides merdae), as well as two Ruminocuccus genera (Ruminocuccus bromii and Ruminocuccus champanellensis), were also identified as important microbial traits for predicting irAEs onset.
Collectively, the microbial traits identified in our model might harbor promising function influencing the development of irAEs whereas more evidence was needed.
Overall, we utilized a robust machine-learning method to identify a panel of gut microbiome for predicting the occurrence irAEs from pooled data regardless of the various confounders.The baseline gut microbiome could function as effectively predictive biomarkers for irAEs with high sensitivity and specificity.External amplicon sequencing data and shotgun metagenome sequencing data were further carried out to validate the robust results.Additionally, from microbial functional analysis, in patients with irAEs, levels of menaquinone biosynthesis and the key enzymes involved in menaquinone synthesis are lower than in non-irAEs.This was similar to the previous findings on melanoma patients treated with anti-CTLA-4 inhibitors, where bacterial metabolism pathways involved in B vitamin biosynthesis were indicated to protect patients from immune-related colitis [35].Based on the above findings, we deduced that regulation of the gut microbiome or the use of gut microbiome-derived metabolites like menaquinone, either prophylactically or concurrently with anti-PD-1/PD-L1 treatment, might harbor the potential to control and prevent the incidence of irAEs.It remains unclear whether reduced menaquinone levels contribute to irAEs onsets or development and further research was in demand.
Moreover, integrated analysis of the gut microbiome and colon tissue bulk RNA sequencing further showcased the potential mechanism for the inhibition of the NF-κB signaling pathway.Similarly, Luoma et al. [63] delineated a significantly elevated TNFα signaling via NF-κB in the colonic macrophage through a comprehensive single-cell analysis of immune cell population in patients with ICIsassociated colitis.The gut microbiome and its metabolites might interact with the host genome and restore the host immune cells, causing inflammation activation.Accordingly, more rationales were required to interpret how gut microbiome influenced host and immune cells.
Some controversy remains to be clarified regarding whether the treatment of irAEs might disrupt the efficacy of immunotherapy.For instance, non-small cell lung cancer patients who develop G3 or G4 pneumonitis and consequently receive high-dose corticosteroids for at least 4-6 weeks tend to have a worse prognosis.Conversely, patients with melanoma who discontinued treatment due to irAEs and utilized immune suppressant agents did not exhibit distinct outcome [64][65][66].With the identification of those predictive biomarkers of irAEs, including gut microbiota biomarkers, it becomes possible to greatly enhance efficacy while mitigating irAEs [67,68].
Our study also has certain limitations, primarily related to the study's size and various factors confounding the omics data used for analysis.To gain a more comprehensive understanding of the biological markers associated with irAEs, it is important to improve the experimental design by controlling unnecessary confounding factors for more relevant clinical data.This will enable us to delve deeper into the complex mechanisms underlying the gut microbiome and the occurrence of irAEs [69].Furthermore, it is essential to bolster our findings with in vivo and in vitro experiments.The validation of menaquinone's functionality could be further substantiated through additional preclinical modeling.

Conclusion
Our study underscores the predictive potential of microbial biomarkers for irAE onset.Menaquinone, derived from the microbiome, emerges as a promising therapeutic agent to modulate irAE occurrence.
test for species filtering between with blocking on the 'Study' .Important species (all P value < 0.1) compared between irAEs and non-irAEs and between responders and non-responders, were showcased in the data sheet, respectively.Table S7.Fourteen Species and relative abundance for model classifier.Table S8.Wilcox-rank sum test for pathway from PICRUST2 output.Differential pathways (FDR < 0.005) were showcased in the data sheet.Table S9.Wilcox-rank sum test for EC number from PICRUST2 output.Differential EC (FDR < 0.005) were showcased in the data sheet.Table S10.Differential genes (FDR < 0.5) from the colon tissue RNA sequencing (N = 9) were showcased in the data sheet.Table S11.Differential KEGG pathways (FDR < 0.05) analyzed by gene set enrichment analysis (GSEA).

Fig. 1
Fig. 1 Study characteristics and microbiota composition difference in different irAEs status.A Study characteristics and demographic proportion of irAEs.B Microbiota composition (Phylum level) in the different studies of individuals.C Microbiota composition (Phylum level) in the different irAEs status analyzed with all cohorts demonstrated in 1A.D Visualization of differential microbiota in anti-CTLA-4 cohorts (Dubin and Chaput) by volcano plot.E Visualization of differential microbiota in anti-PD-1/PD-L1 cohorts (Zhang, Hakozaki, Chau, and McCulloch) by volcano plot

Fig. 2 Fig. 3
Fig.2Alterations of gut microbial composition with different irAEs status among three anti-PD-1/PD-L1 cohorts.A Variance explained by irAEs status (irAEs versus non-irAEs) is plotted against variance explained by study effects for individual ASVs.The significantly differential ASVs are colored in red and the dot size is proportional to the abundance of each ASV.P values were from a two-sided blocked Wilcoxon rank-sum test.B Relative abundance of bacterial phyla in irAEs and non-irAEs across all three different studies.C Alpha diversity analysis calculated with Fisher alpha, Richness, Shannon, and Simpson indexes.D Principal coordinate analysis of samples (irAEs: n = 77; non-irAEs: n = 113) from all three anti-PD-1 studies based on Bray-Curtis distance.P values of beta diversity based on Bray-Curtis distance were calculated with PERMANOVA.The study is color-coded and the group (irAEs and non-irAEs) is indicated by different shapes.The upper-right and the bottom-left boxplots illustrate that samples projected onto the first two principal coordinates broken down by study and disease status, respectively.P values of the first and second principal components were calculated with a two-sided Kruskal-Wallis test for study and group.All boxplots represent the 25th-75th percentile of the distribution; the median is shown in a thick line at the middle of the box; the whiskers extend up to values within 1.5 times of IQR, and outliers are represented as dots.The anti-PD1/PD-L1 cohorts comprising over 50 patients, such as those led byZhang et  al., Hakozaki et al., and McCulloch et al., were utilized for the analysis in this part

Fig. 4
Fig. 4 Identification of minimal features for the best model performance and multiple metrics for model evaluation.A Average AUC of ten-fold Random Forest cross-validation, study-to-study transfer validation classifiers, and LOSO validation for non-irAEs versus irAEs with a different number of features.B Comparison of RF score distributions calculated by RF14 classifier between non-irAEs (n = 113) and irAEs (n = 77) groups.Two-sided P values were calculated using the Wilcoxon rank-sum test.C Performance measurements of RF14 classifier illustrated by sensitivity, specificity, accuracy, positive predictive value (PPV), and negative predictive value (NPV).D The comparison of microbiome model specificity between irAEs and response.The anti-PD1/PD-L1 cohorts comprising over 50 patients, such as those led by Zhang et al., Hakozaki et al., and McCulloch et al., were utilized for the analysis in this part

Fig. 5
Fig.5 External validation cohorts and evaluation metrics for microbiome model.A Receiver operating characteristic (ROC) curves for the validation of microbiome model using SH amplicon cohort (n = 65).B Statistical analysis was conducted based on the predictive value and actual value of irAEs using the optimal thresholds of RF score defined in the training cohort from SH amplicon cohort (n = 65), chi-square test.C Performance measurements of RF14 classifier for SH cohort illustrated by sensitivity, specificity, accuracy, positive predictive value (PPV), and negative predictive value (NPV).D Receiver operating characteristic (ROC) curves for the validation of microbiome model using JS metagenome cohort (n = 50).E Statistical analysis was conducted based on the predictive value and actual value of irAEs using the cut-off value of RF score defined in the training cohort from JS metagenome cohort (n = 50), chi-square test.F Performance measurements of RF14 classifier for JS cohort illustrated by sensitivity, specificity, accuracy, positive predictive value (PPV), and negative predictive value (NPV)